Diagnostic to distinguish bacterial infections

ABSTRACT

Assays, arrays, and methods for distinguishing a bacterial infection from a viral infection are disclosed. The antibiotic crisis is in part driven by overprescription of antibiotics. There is a tendency, particularly in pediatrics, to give an antibiotic even for viral infections. Thus, embodiments herein are directed to the problem of distinguishing a bacterial infection from a viral infection to reduce unnecessary antibiotic usage.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/896,117, filed Jun. 8, 2020, which is a continuation of U.S. application Ser. No. 16/480,887, filed Jul. 25, 2019, and issued on Jul. 14, 2020 as U.S. Pat. No. 10,712,342, which represents the national stage entry of PCT International Application No. PCT/US2018/016185, filed on Jan. 31, 2018, and is based on, and claims a priority benefit from U.S. Provisional Patent Application No. 62/452,825, filed Jan. 31, 2017, and entitled “Diagnostic to Distinguish Bacterial Infections,” each of which is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under HSHQDC-15-C-B0008 awarded by the Department of Homeland Security, Science and Technology. The government has certain rights in the invention.

REFERENCE TO A SEQUENCE LISTING SUBMITTED VIA EFS-WEB

The content of the ASCII text file of the sequence listing named “Sequence_Listing_CALV_036C2.txt” which is 1.38 kb in size was created on May 31, 2022 and electronically submitted via EFS-Web herewith the application is incorporated herein by reference in its entirety.

BACKGROUND

Antibiotic resistance is a global problem mainly due to the overuse of antibiotics in clinical settings. Overuse is mainly caused by the lack of accurate diagnosis that can distinguish bacterial infections from other types of infections. This is especially true for respiratory tract infections and pediatric sepsis. More accurate diagnosis at the time of an initial clinical visit that can distinguish bacterial from other infections would greatly curb the antibiotic overuse problem.

A major advance in stemming the antibiotic crisis would be to have a diagnostic that could readily distinguish a bacterial from a viral infection on presentation with symptoms. This would decrease the unnecessary use of antibiotics while still allowing their optimal application for bacterial infections. Current research on distinguishing bacterial from viral infections has mostly focused on genome-wide expressions (GWAS). The notion is that gene expression will change upon infection by different pathogens. However, a serological detection method for pathogens is the antibody response. There are many complicating factors that make analysis of antibodies between viral and bacterial infections complex; one of the most important is the study platform.

SUMMARY

Embodiments of the current disclosure describe an array and methods for distinguishing a bacterial infection from a viral infection. In certain embodiments, the array comprises two peptides, which first peptide comprises SEQ ID NO: 1 and said second peptide comprises SEQ ID NO: 2. Further, the first peptide comprises a motif able to be bound to a plurality of bacterial specific antibodies, wherein the motif comprises SEQ ID NO: 3 and the second peptide comprises a first motif and a second motif able to be bound to a plurality of bacterial specific antibodies, wherein the first motif comprises SEQ ID NO: 4 and the second motif comprises SEQ ID NO: 5.

In certain embodiments, a method to distinguish a bacterial infection from a viral infection is disclosed. The method contains the steps of contacting an antibody-containing sample with an array of immobilized peptides, wherein said peptides are selected from a group consisting of one or more peptides that bind to antibodies produced in response to a bacterial infection and one or more peptides that bind to antibodies produced in response to a viral infection; and detecting binding of an antibody from said sample with a peptide from said group.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be better understood from a reading of the following detailed description taken in conjunction with the drawings in which like reference designators are used to designate like elements, and in which:

FIG. 1. Hierarchical clustering for the correlation of the whole immunosignature by type of infection shows potential classification of bacterial versus viral infection. Correlation is calculated for each pair-wise sample comparison, then the samples that belong to the same class are averaged to a single correlation value. The clustering table shows most viruses can be distinguished from the bacteria, with the exception of flu.

FIG. 2, Panels A-D: Performance of distinguishing bacterial versus viral infection. Panel A: PCA analysis on the selected peptides shows one factor is responsible for most variability, test set samples are highlighted in the right figure. Panel B: Clustering of the selected peptides shows most peptides are bacteria specific peptides. Panel C: Performance of the classification algorithms. Panel D: Two selected peptides can achieve similar performance of classification.

FIG. 3, Panels A-C: Performance of distinguishing bacterial versus viral and other types of infection. Panel A: PCA analysis on the selected peptides shows one factor is responsible for most variability, test set samples are highlighted in the right figure. Panel B: Clustering of the selected peptides shows most peptides are bacteria specific peptides. Panel C: Performance of the classification algorithms.

FIG. 4. Hierarchical clustering for the correlation of the whole immunosignature by type of infection including all classes. Non-infected class is more similar to bacterial infection, while the non-bacterial and non-viral infections are spread out in groups.

FIG. 5. Scatterplot of the 2 selected peptides. Color is true class. All samples are included in this figure. Both peptides are bacteria specific peptides.

FIG. 6. Hierarchical clustering for the correlation of the whole immunosignature of each sample within bacterial and viral infections. More virus samples are misclassified as bacteria and mostly are influenza samples. Specificity for virus is nearly 100%.

FIG. 7. Hierarchical clustering for the correlation of the whole immunosignature of each sample within bacterial and viral infections. More virus samples are misclassified as bacteria and mostly are influenza samples. Specificity for virus is nearly 100%.

FIG. 8. Probability graph for being virus using Neural Network method in bacteria vs viral infection experiment. Color is true label. All samples are included in this figure. Graph shows good separation between the two groups.

DETAILED DESCRIPTION

Embodiments of the disclosure are described in preferred embodiments in the following description with reference to the Figures, in which like numbers represent the same or similar elements. Reference throughout this specification to “one embodiment,” “an embodiment,” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

The described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are recited to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.

Immunosignature is a peptide microarray that derives peptide sequences from random space rather than biological sequence space. The analysis of semi-random sequences allows for a mostly unbiased search for antibodies that may display a common binding motif. The applicant does not focus on sequences for any given pathogen, which allows exploring more broadly for antibodies that may fall into a pattern that overlaps bacteria and virus.

The immunosignatures of 40 different types of pathogens were examined. Each type of infection has a signature that distinguishes it from people without the infection and from other types of infection. Immunosignature, by measuring the antibody response against pathogens, can distinguish bacterial from viral infections. Further, the applicant was able to identify 2 peptides that can distinguish the two classes, which would yield a biomarker with more clinical utility. Further, immunosignature can distinguish bacterial infection from non-bacterial infection generally, which also has clinical relevance, since there are always non-bacterial and non-viral infections present in clinical settings.

Pooled materials were examined for common immunosignatures between 4 different bacteria and 5 different viruses, in total 280 samples. Immunosignatures (IMS) are patterns of antibody binding on 125,000 peptide feature chips. The peptides are chosen from random peptide sequence space to maximize chemical diversity for discriminating antibody binding. Immunosignatures have been demonstrated to readily distinguish different types of infections and chronic diseases. A training set of x samples of bacteria, viral and non-infected were used to establish the signature. The training set was validated on an independent set of samples and then tested on another completely independent set. A set of 1000 peptides were identified that could make the distinction with X specificity and Y sensitivity in the test set. Most of the misclassifications were influenza samples called as bacterial infections. To further explore the limits of IMS we included samples from 3 eukaryotic pathogens. While the ability to distinguish all 3 classes decreased in accuracy, the accuracy of distinguishing bacterial from virus and eukaryotic pathogen increased. To examine whether a smaller number of peptides could distinguish bacterial from viral infection, each peptide was tested for performance. We determined that two peptides performed as well as the 1000 in making the bacteria versus virus call. These two peptides contained motifs that were common in bacterial proteomes. We used the natural bacterial peptides in a simple spot assay to test bacterial and viral sera samples and demonstrated preliminary distinction of the samples.

The sera samples used are listed in Table 1. The term “sample” includes any biological specimen obtained from an individual. Suitable samples for use in the present invention include, without limitation, whole blood, plasma, serum, saliva, urine, stool (i.e., feces), tears, and any other bodily fluid, or a tissue sample (i.e., biopsy) such as a small intestine or colon sample, and cellular extracts thereof (e.g., red blood cellular extract). In a preferred embodiment, the sample is a blood, plasma, or serum sample. In a more preferred embodiment, the sample is a serum sample. The sera samples used here represent a wide range of bacterial and virus species. There were between 9 and 22 sera samples from each type of pathogen. Each sample was run in duplicate on the standard CIMV7 arrays containing 125K peptides. The process has been described, but briefly, involves diluting the sample 600× in buffer, applying it to the array for one hour, washing and then detecting the pattern of antibody binding with a labeled secondary. In the assays reported here, IgG was detected.

TABLE 1 Sample information used in this application. 12 classes of infections are included in addition to a group of non-infected individuals coded as normal.

Class of infection   Type of infection   Sample number   Count per class
non-infected         non-infected        62              62
Bacteria             Borrelia            9               64
Bacteria             Lyme                13
Bacteria             Syphilis            22
Bacteria             Tuberculosis        20
Virus                Dengue              22              105
Virus                Flu                 22
Virus                Hepatitis B         20
Virus                HIV                 21
Virus                WNV                 20
Other                Chagas              19              57
Other                Malaria             17
Other                Valley Fever        21

If the immune system responds to bacterial and viral infections differently, a high correlation for the immune responses within each group and low correlation between them is expected. Correlation is calculated for each pair-wise sample comparison with the 125K features on the immunosignature, then the samples that belong to the same class are averaged to a single correlation value (FIG. 1). For example, correlations for all comparisons between any Dengue samples versus any WNV samples are averaged into a single value, representing the average correlation between the two groups. Hierarchical clustering results show bacteria and virus are separated, with the only exception of influenza virus, which is classified with bacteria. FIG. 1 demonstrates the initial unsupervised division showing that influenza virus is the sole misclassified group, classified with bacteria. Given the fact that there is a lot of noise in the immune system and numerous irrelevant antibodies circulating in the blood, this performance exceeds expectation and confirms the hypothesis that the immune system is able to distinguish bacterial and viral infections by producing different antibodies. Non-infected samples and non-bacterial non-viral pathogens are mixed when including them in the correlation table (FIG. 4).
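The class-level averaging of pair-wise correlations described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names, the choice of Pearson correlation, and the toy intensity vectors are all assumptions, and a real array would have about 125,000 features per sample rather than four.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def class_mean_correlation(samples, labels):
    """Average all pair-wise sample correlations into one value per class pair.

    samples: list of intensity vectors, one per serum sample
    labels:  class name of each sample, e.g. "Dengue" or "WNV"
    Returns {(class_a, class_b): mean correlation}, class names sorted in the key.
    A sample is never compared with itself.
    """
    sums, counts = {}, {}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            key = tuple(sorted((labels[i], labels[j])))
            sums[key] = sums.get(key, 0.0) + pearson(samples[i], samples[j])
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy data (hypothetical): two classes with opposite intensity trends.
toy_samples = [
    [1.0, 2.0, 3.0, 4.0],   # class A
    [2.0, 4.0, 6.0, 8.5],   # class A
    [4.0, 3.0, 2.0, 1.0],   # class B
    [8.0, 6.5, 4.0, 2.0],   # class B
]
toy_labels = ["A", "A", "B", "B"]
table = class_mean_correlation(toy_samples, toy_labels)
```

The resulting class-by-class table is the kind of input a hierarchical clustering step would take; the clustering itself is not shown here.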

A further breakdown per sample, using hierarchical clustering on the correlations for every sample (no sample is averaged), is shown in FIG. 6. The specificity for viral infections is close to 100%, with some viruses being classified as bacteria, mostly influenza. This result is consistent with the class level clustering result.

Since samples in one class are merged, performance of each sample is determined. Hierarchical clustering for the correlation table for every sample is shown in FIG. 3 panel A. The specificity for virus is close to 100%, with some viruses being classified as bacteria, mostly influenza viruses. This result is consistent with the per-class clustering result.

EXAMPLES

Example 1: Building Bacterial Versus Viral Infection Classifier that Shows Robust Distinction

Once the viability of distinguishing the two types of infections was confirmed, we utilized machine learning techniques to classify the samples. In this experiment, only bacterial and viral infection samples are used, with a total of 157 samples. Experimental workflow is outlined in FIG. 7. All samples are randomly divided into training, validation and held-out test sets, with a ratio of 60%, 20%, 20%. Training and validation sets are used to build the classifier. The test set remains untouched until the final model is constructed and is used only for evaluation.
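A random 60%/20%/20% division of this kind might look like the sketch below. The function name, the fixed seed, and the plain (unstratified) shuffle are assumptions made for illustration; the text does not state how the random assignment was implemented.

```python
import random


def split_samples(sample_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample IDs and cut them into training, validation, and test sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)      # seed is arbitrary, for reproducibility only
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    train = ids[:n_train]
    validation = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # held out until the final evaluation
    return train, validation, test


# 157 placeholder sample IDs, matching the sample count quoted in the text.
train, validation, test = split_samples(range(157))
```

With 157 samples this yields 94, 31, and 32 samples in the three sets. The important property is that the test IDs never overlap with the IDs used for feature selection or model fitting.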

Since there are 125,000 features on the immunosignature platform, it is sensible to first do feature selection to find the most useful peptides and remove noise. Feature selection is performed using training and validation set data via a two-tailed t-test for every peptide, and the top 1000 significant peptides are used. Note that the general cutoff is either selecting the top 1000 peptides or p-value<1/125,000, which controls the expected number of false positives overall to be less than 1. Whichever cutoff yields fewer peptides is used in the experiment. For the tests we performed, the p-values are much lower than 1/125,000. As a result, a common cutoff of the top 1000 peptides is used throughout the application. In certain embodiments, the diagnostic peptides are chosen against a background of non-diagnostic peptides on the arrays. In order to provide the non-specific binding buffer to increase specificity, the diagnostic peptides can be arrayed with a set of 100-10,000 random peptides. These could be mixed or individual and spotted separately or as a mixture.
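The cutoff rule above (top 1000 peptides or p-value<1/125,000, whichever keeps fewer peptides) can be sketched as follows. The function name and the toy p-values are hypothetical, and the per-peptide t-test itself is assumed to have been run elsewhere; only the selection logic is shown.

```python
def select_peptides(p_values, top_n=1000, n_features=125_000):
    """Apply both cutoffs and keep whichever yields fewer peptides.

    p_values: {peptide_id: p-value from the per-peptide two-tailed t-test}
    """
    ranked = sorted(p_values, key=p_values.get)      # most significant first
    by_rank = ranked[:top_n]
    threshold = 1.0 / n_features                     # p-value cutoff of 1/n_features
    by_threshold = [pep for pep in ranked if p_values[pep] < threshold]
    return by_rank if len(by_rank) <= len(by_threshold) else by_threshold


# Expected number of false positives if every one of the 125,000 peptides were null
# and each was tested at the 1/125,000 threshold.
expected_false_positives = 125_000 * (1.0 / 125_000)

# Toy example (hypothetical values) with a scaled-down array of 100 features.
toy_p = {"a": 0.001, "b": 0.2, "c": 0.005, "d": 0.5, "e": 0.03}
selected = select_peptides(toy_p, top_n=3, n_features=100)
```

In the toy example the p-value threshold (0.01) passes two peptides while the rank cutoff would keep three, so the threshold list is returned. In the study the opposite held: p-values were far below 1/125,000, so the top-1000 rule was the binding one.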

Using the selected features, Principal Component Analysis (PCA) is performed to determine how many components are responsible for the majority of the variability (FIG. 2, panel A). Interestingly, it was found that component 1 alone explains over 60% of the variability, indicating at least one factor is strongly driving the variance across groups, at least for the selected features. The test samples are not used in feature selection; however, when analyzed with PCA (highlighted in FIG. 2, panel A), the test set samples are well separated, as the validation set would suggest, indicating that overfitting is negligible. Hierarchical clustering is performed using the selected features to visualize the data (FIG. 2, panel B). As we can see, most peptides are relatively higher in intensity in bacterial than viral infections. This suggests that the selected peptides reflect antibody responses raised to the bacterial infection. The test set samples are also highlighted in the clustering heatmap to show their clustering group location compared with the training and validation set. No obvious overfitting is noted, as test set samples are generally clustered in the right class.

Machine learning classifiers like Random Forest and Neural Networks are used to build the model of classification between the two groups. For each classifier, a model is trained using training data, and a validation set is used to fine-tune the model and gain an initial performance evaluation to limit overfitting. The established model is then applied to the test set for a final performance evaluation on this independent dataset. Experiments that use only a training group usually result in overfitting because the classifier might adjust to the random variations in the training group to gain the best fit scores. Using only a validation set poses the same issue because the model is generated with information from the validation dataset. In microarray studies, there are inevitably more variables than observations, so overfitting becomes more pronounced. Independent datasets are needed to test the performance of the classifier; the test set data are never used from feature selection through model generation and are only used for the final evaluation of the model.

As shown in FIG. 2, panel C, Random Forest and Neural Networks both have minimal misclassification rates on both training and validation. The final performance on the test set is also similar for both classifiers. Random Forest tends to exhibit less sensitivity to the bacterial infections (sensitivity at 0.58) but is extremely specific (0.95). This is a bias toward true negatives at the cost of fewer true positives. Neural Network models yielded a better balance between true positives and false positives for the two groups, with sensitivity and specificity at 0.83 and 0.84 respectively (FIG. 8). Both models yield misclassification rates of less than 20%. Since it may be that up to 60% of human infections are from virus, if doctors can distinguish viral infections from bacterial infections, the use of antibiotics could be reduced by over 50%.
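For reference, the three figures quoted above relate to confusion-matrix counts as in the sketch below, with bacterial infection treated as the positive class. The counts used are hypothetical and chosen only for illustration; the per-sample counts behind the reported values are not given in the text. The last lines show one possible reading of the "over 50%" estimate, under the added assumption that antibiotics would otherwise be given in every case.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and misclassification rate (bacteria = positive)."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),           # bacterial samples called bacterial
        "specificity": tn / (tn + fp),           # viral samples called viral
        "misclassification": (fn + fp) / total,  # all wrong calls over all samples
    }


# Hypothetical counts for illustration only; not taken from the study.
example = classification_metrics(tp=10, fn=2, tn=16, fp=3)

# One possible reading of the estimate (an assumption, not the stated derivation):
# if 60% of infections are viral and 84% of those are correctly called viral,
# about half of all cases would no longer receive an antibiotic.
viral_fraction = 0.60
specificity = 0.84
avoided_fraction = viral_fraction * specificity
```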

Stepwise regression is utilized to find the optimal, non-redundant peptides that can be used to fit the model. Each peptide has to meet a p-value cutoff of 0.1 to enter the model and exits the model when it exceeds the exit cutoff p-value of 0.1. Regression is started assuming all peptides are out of the model. The whole process is iterated until the model is stabilized, i.e., no peptides leave or enter the model. Then the model is fine-tuned to maximize RSquare for the validation set (FIG. 2, panel D). The final model only includes two peptides, GALSRSFANVSFPGVAG (SEQ ID NO: 1) and GLSNGASSFGKASGVAL (SEQ ID NO: 2) (FIG. 5). Specificity and sensitivity for the test set come to 0.75 and 0.89 respectively, only marginally worse than the complete models using all 1000 peptides. The misclassification rate is 0.16, no worse than the complete models.

Upon performing a BLAST search on these 2 peptides against the RefSeq database excluding Homo sapiens, Models (XM/XP) and Uncultured/environmental sample sequences, they were found to be highly enriched in bacteria but not in virus (FIG. 4). Furthermore, they are prevalent in all types of bacteria and all types of proteins, suggesting they are indeed good bacterial infection detection molecules/biomarkers.

TABLE 2 Performance of bacterial vs non-bacterial infection classification using 5 selected peptides. Peptides are selected from stepwise regression using mixed p-value model at cutoff of 0.1. Logistic fit is then performed using the selected peptides. Test set performance is much lower compared with the complete model using all selected peptides from T-Test.

Logistic fit              Training   Validation   Test
Sample size               127        42           43
Misclassification rate    0.06       0.14         0.23
Sensitivity (Bacteria)    0.89       0.58         0.45
Specificity (Bacteria)    0.96       0.97         0.875

Example 2: Epitopes of Bacteria were Identified Via Blast Search of the Two Peptides Followed by Ungapped Motif Mapping

Epitopes within the two peptides were examined. The two peptides may contain bacterial epitopes or mimotopes that enhance bacteria-specific antibody binding. We then did a protein blastp search of the 2 peptides against the Bacteria (taxid:2) with no E-value cutoff. One hundred matched sequences in bacteria proteomes were identified and subsequently submitted to the MEME tool in the MEME suite to identify consensus motifs. The identified motif(s) are the epitope(s) from bacteria that the 2 peptides represent. Results are shown in Table 3. One epitope (SEQ ID NO: 3) is identified for peptide 1 (SEQ ID NO: 1); and two epitopes (SEQ ID NO: 4 and SEQ ID NO: 5) were identified for peptide 2 (SEQ ID NO: 2). It is interesting to note that for peptide 1 (SEQ ID NO: 1), only 6 amino acids seem to be the target of bacterial specific antibodies. As for peptide 2 (SEQ ID NO: 2), the full length of the peptide could be the target of bacterial specific antibodies. Each epitope is matched with at least 20 sequences from the bacterial proteome, so the epitopes are broadly represented in the bacterial world.

TABLE 3 Identified epitopes of bacteria with the 2 bacterial-viral distinguishing peptides. Peptide 1 (SEQ ID NO: 1) has 1 epitope (SEQ ID NO: 3) with length of 6 amino acids (a.a.), while peptide 2 (SEQ ID NO: 2) has 2 matched epitopes (SEQ ID NO: 4 and SEQ ID NO: 5) with length of 8 a.a. and 6 a.a. respectively. Matched part is highlighted with color in peptides. This implies only part of peptide 1 is identified by bacterial specific antibody while the whole sequence of peptide 2 is the target for bacterial antibodies.

Peptide                             Epitope 1                 Epitope 2
GALSRSFANVSFPGVAG (SEQ ID NO: 1)    RSFANV (SEQ ID NO: 3)
GLSNGASSFGKASGVAL (SEQ ID NO: 2)    SFGKASGV (SEQ ID NO: 4)   LSNGAS (SEQ ID NO: 5)
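Where each epitope sits within its peptide can be checked with a simple ungapped exact-match search, sketched below. The sequences are those listed in Table 3; the function name and the exact-string matching are assumptions for illustration (motifs reported by MEME are position-specific profiles, of which Table 3 lists the consensus strings).

```python
PEPTIDES = {
    "SEQ ID NO: 1": "GALSRSFANVSFPGVAG",
    "SEQ ID NO: 2": "GLSNGASSFGKASGVAL",
}
MOTIFS = {
    "SEQ ID NO: 1": {"SEQ ID NO: 3": "RSFANV"},
    "SEQ ID NO: 2": {"SEQ ID NO: 4": "SFGKASGV", "SEQ ID NO: 5": "LSNGAS"},
}


def map_motifs(peptide, motifs):
    """Return 1-based (start, end) spans per motif and the set of covered positions."""
    spans, covered = {}, set()
    for name, motif in motifs.items():
        start = peptide.find(motif)               # ungapped, exact match
        if start == -1:
            continue                              # motif not present in this peptide
        spans[name] = (start + 1, start + len(motif))
        covered.update(range(start + 1, start + len(motif) + 1))
    return spans, covered


results = {pid: map_motifs(seq, MOTIFS[pid]) for pid, seq in PEPTIDES.items()}
```

Under this exact-match reading, the single epitope of peptide 1 covers positions 5 to 10 (6 of 17 residues), while the two epitopes of peptide 2 are adjacent and together cover positions 2 to 15 (14 of 17 residues), which agrees with the observation that nearly the whole of peptide 2 is matched.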

Example 3: Broad Bacterial Versus Non-Bacterial Infection Classifier Shows Robust Distinction and Better Performance

In clinical settings, it is possible that one will encounter infections that are neither bacterial nor viral, so the ability to distinguish the other types of infections is important in making a correct diagnosis. Thus, the focus is on binary classification of bacterial vs non-bacterial infections, because patients with bacterial infection can immediately receive antibiotics as treatment while other infections need more detail to arrive at a disease specific treatment. Accordingly, for example, Chagas, malaria and Valley Fever are included as noise and combined with viral infections as the non-bacterial infection class.

Experiments were performed as described in FIG. 7: samples are divided into training, validation and test sets. Training and validation sets are used to do feature selection and construct a model, then performance is tested on the independent test set. Results are summarized in FIG. 3, panels A-C. PCA analysis (FIG. 3, panel A) and hierarchical clustering (FIG. 3, panel B) show separation of the two groups similar to FIG. 2, panels A-D, suggesting performance does not deteriorate when noise is added. The Random Forest model and Neural Network model revealed misclassification rates of 0.12 and 0.09 for the test set, which is an improvement compared with the bacterial vs viral only model. The better performing Neural Network model is at 0.83 sensitivity and 0.94 specificity for bacteria with a Generalized RSquare of 0.73, all of which is a significant increase compared to the original bacterial vs viral model. This improvement can be the result of more samples being used for model construction, or of including more types of infection as the non-bacteria comparison, which can lead to a more robust bacterial specific signature.

In this experiment, we also attempted to find the minimal number of peptides that can achieve similar performance compared with using all selected peptides. However, after the same stepwise regression process, the best performance we can get is using 5 peptides to gain a misclassification rate of 0.23, which is not as good as the complete model using all 1000 peptides. The sensitivity for bacteria was only 0.44, also significantly lower than the Neural Network model.

DISCUSSION

Immunosignature, a microarray-based serological test that uses semi-random peptides to splay out the antibody repertoire from infected individuals, is used to distinguish viral infections from bacterial infections. Immunosignatures can detect peptides that generally separate bacterial infections from viral infections. Machine learning models were used to identify the predictive performance of a given set of peptides across 169 patients, of whom 64 have bacterial infections and 105 have viral infections (Table 1). We achieved over 84% accuracy, 84% specificity, and 83% sensitivity, and could achieve similar accuracy, specificity, and sensitivity with as few as two peptides. These two peptides were overrepresented in bacterial proteomes, and underrepresented in viral proteomes. Even when adding fungal and protozoan infections, high specificity is maintained, an important goal to achieve when attempting to reduce improperly prescribed antibiotics.

Accurate diagnosis of bacterial and viral infections is needed in clinical settings. The current imprecise diagnosis results in either overuse of antibiotics or delayed treatment for patients. Herein is a novel diagnostic based on immunosignature technology that is able to reliably distinguish bacterial infections from viral infections. By measuring the antibody response of patients with different infections, the ability to distinguish the majority of the bacterial and viral infections has been demonstrated. We further construct models based on selected features by applying machine learning algorithms to the selected features. This model is able to classify the two types of infections with a misclassification rate of less than 20%, exceeding current methods used either in research or clinical settings. Since in clinical settings non-bacterial, non-viral infections will be expected, we also constructed a model aimed at distinguishing bacterial versus all other non-bacterial infections, consisting of viral infection and noise infections including Chagas, Malaria and Valley Fever. This model shows even better performance with a misclassification rate of about 10%.

Several studies using gene expression profiling have shown potential to distinguish bacterial from viral infections. The logic behind those studies is that genes will be differentially regulated when encountering different infections. The same is the case for antibody response. Antibody response is the most direct reaction to an infection. Given the fact that genes, as indirect reactions, can still work to distinguish infection types, antibody response should be an even better approach because it directly targets the pathogens. One thing worth noting is that compared with gene microarrays, where it is usually one-to-one binding, antibodies will usually bind to multiple peptides on an immunosignature platform as long as the peptides are mimotopes of the true epitope. As a result, more peptides are used in analysis for the immunosignature experiments.

Correlation of the infections is used to first test the possibility of distinction at the antibody system level. The logic behind using correlation of infections is that the immune system might systematically see the difference between bacterial and viral infection by activating different pathways. Immunosignature platforms measure the antibody repertoire in the blood. If all the data from the platform are used, then the immune system as a whole is being measured. Correlation of the immune system can then be tested by calculating the correlation of the immunosignature for different pathogens. The results from the correlation offer insights into understanding both diagnosis and how the immune system works. It seems the immune system is able to distinguish most bacterial and viral infections and mount totally different immune responses, since only one infection is misclassified. This confirms the notion that our immune system probably knows the source of the infection and responds accordingly. Or perhaps the immune system does not know the source of infection, but because all infections within the same class are so similar, the immune system always produces similar antibodies against various bacterial infections. The same might be the case for viral infection. As discussed above, most of the signatures that can distinguish bacterial and viral infection are bacterial specific signatures, implying the immune system is producing various antibodies against bacterial infection in ways analogous to broad-spectrum antibiotics.

The result that influenza virus is misclassified into bacteria is interesting because it suggests that influenza virus somehow successfully tricked the immune system into thinking it is bacteria, such that antibodies against bacteria are produced, the result of which will be ineffective. This is consistent with the fact that influenza viruses are highly contagious worldwide, implying the immune system cannot quickly mount an effective immune response because influenza virus is regarded as “bacteria.” This misclassification by the immune system might also explain why there are already pre-existing neutralizing antibodies within the immune system, but they are not usually elicited during flu infection.

Overfitting has been a major problem in microarray studies. Here we approach the experiments with pre-isolated test set data to avoid the problem. The whole model construction process is without information from the test set. After the model is stabilized, its performance is tested with the test set data. The results show there is little overfitting when migrating the model from the training and validation sets to the test set.

In the bacterial versus viral infection model, we are achieving accuracy of over 80% in both classifiers tested. Clinicians can choose which classifier to use based on experience, since following the random forest classifier will minimize the misdiagnosis of viral infection as bacterial infection, hence lowering the usage of antibiotics, while the neural network classifier tends to balance the error rate in each class, resulting in more usage of antibiotics but less suffering of patients who genuinely have a bacterial infection. Features being selected from this study are almost exclusively from bacterial infection, indicating there is more commonality with the immune response.

The two peptides were further examined by identifying matched sequences from bacteria proteomes and then identifying consensus motifs with the matched sequences. These consensus motifs could be the binding target within the two peptides on the immunosignature. Only 6 a.a. of consensus motif in one of the peptides is identified and the full length of the other peptide is matched by bacterial antibodies. This indicates there could be redundancy in these two peptides.

Interestingly, when non-bacterial and non-viral infections are added as the non-bacterial class, the performance of the model actually increases. Accuracy is at ~90% in both classifiers. Specificity for bacteria is ~95% in both classifiers, indicating this model is good at distinguishing non-bacterial infections. When coupled with the results of the clustering heatmap, it appears that our immune system sees the commonality for bacterial infections but not other types. This is interpreted from the classifier result that all features are bacteria specific features, so a sample that lacks those features is classified into the non-bacterial class.

In summary, we are able to construct classifiers that perform well for bacterial versus viral infection. We validated each model using an independent dataset to confirm the robustness of the model. We are also able to confirm the source of the selected features, which in turn offers the logic for the success of the model. We believe immunosignatures can be beneficial when used in clinical settings both to combat the antibiotic overuse problem and to reduce the suffering of patients. In other words, a patient whose antibody-containing sample binds to one or more peptides indicating a bacterial infection can be treated with antibiotics, while a patient whose antibody-containing sample binds to one or more peptides on the array indicating a viral infection can be treated with an anti-viral medicament, such as nitazoxanide.

Materials and Methods

Study Design

Serum samples were collected from the various sources described in detail below and received at Arizona State University (ASU) under Institutional Review Board Protocol #0912004625, “Profiling Serum for Unique Antibody Signatures”. All samples were collected with informed consent and were anonymized. Every disease sample tested positive for the specified disease before being provided to ASU. Bordetella pertussis samples were provided by Seracare Life Sciences (Seracare), tuberculosis by the University of Texas at El Paso (UTEP), malaria by Seracare, HIV by Creative Testing Solutions (CTS), flu by BioreclamationIVT, dengue by UTEP, WNV by CTS, VF by Sonora Lab, Chagas by CTS, Lyme by Seracare, hepatitis B by CTS, and syphilis by Seracare.

Bordetella pertussis, Lyme, Syphilis, Tuberculosis, Dengue, Flu, Hepatitis B, HIV and WNV samples are used in the bacterial versus viral experiment. Chagas, Malaria and Valley Fever were added in the bacterial versus non-bacterial experiment. All samples are randomly assigned to the training, validation and test sets with equal probability.
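The random assignment described above can be sketched as follows. This is an illustrative sketch only: the function name, the fixed seed, and the sample identifiers are assumptions, and the text does not state which tool performed the split.

```python
import random


def assign_splits(sample_ids, seed=0):
    """Assign each sample independently to one of three sets with equal probability."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    splits = {"training": [], "validation": [], "test": []}
    set_names = list(splits)
    for sample_id in sample_ids:
        splits[rng.choice(set_names)].append(sample_id)
    return splits


# Hypothetical identifiers for 212 samples.
splits = assign_splits([f"sample_{i}" for i in range(212)])
for set_name, members in splits.items():
    print(set_name, len(members))
```

Because each sample is assigned independently, the three sets are only approximately equal in size.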

Immunosignature Assay

Serum samples were diluted 1:1500 into the sample buffer (3% BSA in 1×PBST) before being incubated on immunosignature microarrays at a final volume of 150 µl for 1 h at 37° C. with rotation. Primary antibodies from the serum were then washed 3 times with 1×PBST and rinsed 3 times with ddH₂O. Secondary anti-human IgG antibodies conjugated with Alexa Fluor 555 (Life Technologies) were added at 4 nM in secondary incubation buffer (0.75% casein in 1×PBST with 0.05% Tween 20) to detect primary antibody binding. Secondary antibodies were incubated on the array for 1 h at 37° C. before being washed off with blocking buffer. Slides were then washed with 1×PBST and ddH₂O before drying. Images were obtained by scanning the arrays at 555 nm using an Innoscan 910 scanner. Signal intensities for features were extracted using GenePix Pro 6.0.

Statistical Analysis

Analysis is performed using scripts written in R or the JMP software (SAS Institute Inc.). Raw intensity reads for all samples are normalized to the median per sample. Quality Control (QC) for the samples is performed by checking each sample's average correlation against all other samples. Samples with correlation <0.2 are excluded. A total of 226 samples were run on the immunosignature array, of which 212 passed QC and were analyzed.
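The normalization and QC steps can be sketched as follows. The sketch assumes that "normalized to the median" means dividing each intensity by the sample's median and that the correlation is Pearson's; neither detail is stated in the text. The function names and the intensity values are hypothetical.

```python
import math
from statistics import mean, median


def median_normalize(intensities):
    """Divide each raw intensity by the sample's own median (assumed convention)."""
    m = median(intensities)
    return [x / m for x in intensities]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def qc_filter(samples, threshold=0.2):
    """Keep samples whose average correlation with all other samples is >= threshold."""
    kept = {}
    for name, values in samples.items():
        others = [v for n, v in samples.items() if n != name]
        if mean(pearson(values, o) for o in others) >= threshold:
            kept[name] = values
    return kept


# Hypothetical raw intensities for four samples over five peptides.
raw = {
    "s1": [100, 200, 300, 400, 500],
    "s2": [110, 190, 310, 420, 480],
    "s3": [90, 210, 290, 390, 520],
    "bad": [500, 400, 300, 200, 100],  # anti-correlated with the others
}
normalized = {name: median_normalize(v) for name, v in raw.items()}
passed = qc_filter(normalized)
print(sorted(passed))
```

In this toy input the sample labelled "bad" is the only one whose average correlation falls below 0.2, so it is the only one excluded.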

Feature selection is done using samples in the training and validation sets. A two-tailed Student's t-test is performed for each peptide by comparing bacterial infection samples versus viral infection samples (or non-bacterial infection samples). The cutoff is set to allow 1 false positive across all tests, which is 1/124,000 or 1000 peptides, whichever is smaller.
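One possible reading of this cutoff is a p-value threshold of 1/124,000 (about 8.1e-6), with the selection capped at the 1000 most significant peptides. The sketch below follows that reading, which is an assumption. It takes per-peptide p-values as given rather than computing the t-test, and the function name, peptide names, and p-values are hypothetical.

```python
def select_features(p_values, n_tests=124_000, max_features=1000):
    """Select peptides whose p-value is below 1/n_tests, keeping at most max_features.

    p_values: dict mapping peptide name -> two-tailed t-test p-value.
    """
    threshold = 1.0 / n_tests  # expect about 1 false positive across n_tests tests
    passing = sorted((p, name) for name, p in p_values.items() if p < threshold)
    return [name for _, name in passing[:max_features]]


# Hypothetical p-values, for illustration only.
example_p_values = {"pep_a": 1e-9, "pep_b": 5e-6, "pep_c": 1e-5, "pep_d": 0.03}
selected = select_features(example_p_values)
print(selected)
```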

PCA is performed using the selected peptides with all samples, with the test set samples highlighted in the right PCA plot. Hierarchical clustering is performed using the selected peptides with all samples. Ward's method is used in calculating the distance between the samples. The same method is used in calculating the distance for the features in two-way clustering.
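For reference, Ward's method scores a candidate merge of two clusters by the increase in within-cluster sum of squares that the merge would cause. The following minimal sketch shows the standard form of that criterion. It is not taken from the analysis scripts used here; the function names are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters are merged.

    Standard Ward criterion: (n_a * n_b) / (n_a + n_b) * ||c_a - c_b||^2,
    where c_a and c_b are the cluster centroids.
    """
    n_a, n_b = len(cluster_a), len(cluster_b)
    c_a, c_b = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((a - b) ** 2 for a, b in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * squared_distance


# Two single-sample clusters whose centroids are 5 units apart.
cost = ward_merge_cost([[0.0, 0.0]], [[3.0, 4.0]])
print(cost)
```

At each step, agglomerative clustering with this criterion merges the pair of clusters with the smallest cost.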

Random Forest is carried out with a maximum of 100 trees in the forest. The minimum split per tree is set at 10 and the maximum at 2000. An early stopping rule is applied on the validation set. The performance of the classifier is evaluated and output as a confusion matrix for the training, validation and test sets. The Neural Network is built with one hidden layer of 3 nodes, with TanH as the activation function.
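The neural network architecture described above (one hidden layer of 3 TanH nodes) can be sketched as a forward pass. The logistic output node is an assumption, since the text does not specify the output layer. All weights and inputs below are hypothetical placeholders, not fitted values.

```python
import math


def predict_bacterial_probability(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> 3 tanh hidden nodes -> logistic output (assumed)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two input peptide intensities and 3 hidden nodes.
w_hidden = [[0.8, -0.3], [0.1, 0.5], [-0.6, 0.2]]
b_hidden = [0.0, -0.1, 0.05]
w_out = [1.2, -0.7, 0.4]
b_out = -0.2

probability = predict_bacterial_probability([1.4, 0.9], w_hidden, b_hidden, w_out, b_out)
print(f"{probability:.3f}")
```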

Stepwise regression is used to reduce the number of features, with a stopping rule of a p-value cutoff at 0.1 for both entering and leaving the model. The model starts empty, with no features. A feature is included in the model if its p-value is below the cutoff and is removed from the model once its p-value is larger than the cutoff. This process is repeated until the model stabilizes, with no feature entering or leaving the model. The selected features are then tuned to maximize RSquare on the validation set. Logistic regression is then used to build a model with the 2 selected peptides.
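The enter/leave loop described above can be sketched generically as follows. This is a sketch, not the JMP procedure itself: the caller supplies a hypothetical `p_value_fn(model, feature)` that returns the p-value of `feature` within the given model, and the toy p-values are illustrative only.

```python
def stepwise_select(candidates, p_value_fn, enter=0.1, leave=0.1, max_iter=100):
    """Mixed stepwise selection starting from an empty model.

    p_value_fn(model, feature) is a caller-supplied (hypothetical) function
    returning the p-value of `feature` when `model` (a list containing it) is fit.
    """
    model = []
    for _ in range(max_iter):  # guard against cycling
        changed = False
        # Forward step: add the most significant remaining feature if below cutoff.
        remaining = [f for f in candidates if f not in model]
        if remaining:
            best = min(remaining, key=lambda f: p_value_fn(model + [f], f))
            if p_value_fn(model + [best], best) < enter:
                model.append(best)
                changed = True
        # Backward step: drop the least significant feature if above cutoff.
        if model:
            worst = max(model, key=lambda f: p_value_fn(model, f))
            if p_value_fn(model, worst) > leave:
                model.remove(worst)
                changed = True
        if not changed:  # model has stabilized
            break
    return model


# Toy p-values that do not depend on the rest of the model (illustration only).
toy_p = {"pep_a": 0.01, "pep_b": 0.05, "pep_c": 0.5}
chosen = stepwise_select(list(toy_p), lambda model, f: toy_p[f])
print(chosen)
```

In a real fit the p-value of each feature changes as other features enter or leave, which is why both steps are re-evaluated on every pass.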

A BLAST search of the 2 peptides is done using the NCBI BLAST server. The Protein BLAST (blastp) suite is used. The database is Reference proteins and the organism is limited to Bacteria (taxid:2). Algorithm parameters are set to adjust for short sequences, with max target sequences at 100. The matched sequences are then processed to contain only the linear matched part. The 100 matched sequences are imported into the MEME Suite to identify epitopes, with configurations of 10 minimum sites per epitope and 3 maximum epitopes.

While the preferred embodiments have been illustrated in detail, it should be apparent that modifications and adaptations to those embodiments may occur to one skilled in the art without departing from the scope of the present claims.

What is claimed is:
 1. An array comprising at least two peptides that are capable of differentially binding to one or more antibodies produced in response to a bacterial infection or to one or more antibodies produced in response to a viral infection.